THE RELATIONSHIP OF THE METHOD OF GRAPHIC CORRELATION
TO LEAST SQUARES

By Richard J. Foote and J. Russell Ives 1/


INTRODUCTION 


Considerable use has been made of the graphic method of correlation during the past 10 years, particularly in the field of agricultural economics. The method has been popular for several reasons: (1) It indicates graphically the "net" 2/ relationships between the variables included in the study, (2) it has been thought to be an exceedingly simple and flexible method of studying curvilinear relationships, (3) it is a relatively simple mechanical substitute for mathematical calculations.

Although it is now more than 10 years since L. H. Bean's article on the graphic method of correlation was published 3/, some confusion still exists among the users of the method regarding the correct interpretation of the results obtained in terms of standard mathematical coefficients. It is the purpose of this paper to examine the meaning of the various steps involved in the graphic correlation procedure from the point of view of the Least Squares method.

1/ The authors wish to acknowledge the encouragement given them by F. L. Thomsen, Bureau of Agricultural Economics, and G. S. Shepherd, Iowa State College, in the development of this paper. Suggestions and criticisms from A. Sturges, Bureau of Labor Statistics, and M. A. Girshick, Bureau of Agricultural Economics, have been particularly helpful. Mr. Girshick outlined many of the mathematical proofs, and materially assisted in the preparation of the manuscript.

2/  This  term  has  been  used  in  many  discussions  of  the  graphic  methods.  Its 
meaning  will  be  explained  in  the  body  of  the  paper. 

3/ "A Simplified Method of Graphic Curvilinear Correlation". Journal of the American Statistical Association, 24: 386-97, December 1929. (A mimeographed publication containing essentially the same material and issued by the Bureau of Agricultural Economics is no longer available except in libraries.) The method also is outlined in several textbooks including M. Ezekiel, Methods of Correlation Analysis, John Wiley and Sons, Inc., New York, 1930, and F. L. Thomsen, Agricultural Prices, McGraw-Hill Book Co., Inc., New York and London, 1936.


This article deals with the graphic method as applied to linear relationships or relationships which can be made linear by transformation. 4/

A brief summary of the conclusions presented is as follows:

The method of linear graphic multiple correlation suggested by L. H. Bean essentially is based upon two mathematical principles: (1) The multiple regression equation becomes the equation of a curve when all of the independent variables except one are held constant. In the case of linear regression, the curve is a straight line whose slope is equal to the partial regression coefficient between the dependent variable and that independent variable which is permitted to vary. For this reason the slopes of the drift lines indicate the partial regression coefficient. (2) The method of successive approximation as outlined by Bean is analogous to a mathematical iterative process which converges to the Least Squares solution. Thus, even if an error is made in the first approximations to the regressions, succeeding approximations will tend to yield more and more accurate results. However, the speed of convergence depends chiefly on the size of the error in the first approximation and the size of the correlation between the independent variables. The better the first approximation and the smaller the intercorrelation, the faster will the process tend to converge. The sizes of the intercorrelations are determined by the nature of the variables included in the analysis and hence, once the variables are chosen, very little can be done graphically to speed up the convergence. However, the accuracy of the first approximations may be greatly enhanced by the use of drift lines.

4/ Although the graphic method was developed primarily to handle curvilinear rather than linear relationships, only the linear case is discussed here for the following reasons: (1) by implication, the form of the relationship is given a priori; (2) the Least Squares method, as usually considered, is applicable mainly to linear relationships. A discussion of the graphic method as applied to curvilinear relationships would be complicated by the fact that its counterpart to the mathematical method would be difficult to exhibit. Consequently any attack on the problem of the graphic method as applied to curvilinear relationships must be made experimentally. Such an experiment has been started in the Division of Statistical and Historical Research.

Mathematical  Meaning  of  Partial  Regression. 

The present discussion will deal with regression analyses as a method of estimating relationships between variables for purposes of prediction. The usual situation in such cases is as follows:

A set of $Np$ quantities $X_{1i}, X_{2i}, \ldots, X_{pi}$ $(i = 1, 2, \ldots, N)$ 5/ are considered. It is assumed that (1) for all values of $i$, $X_{1i}$ is a chance variable 6/ and the quantities $X_{2i}, \ldots, X_{pi}$ $(i = 1, 2, \ldots, N)$ are known constants. (2) The mean value of $X_{1i}$ is a linear combination of $X_{2i}, \ldots, X_{pi}$ with unknown coefficients. Thus

$$\text{mean value of } X_{1i} = b_1 + b_2 X_{2i} + \ldots + b_p X_{pi} \qquad (i = 1, 2, \ldots, N)$$

where the b's are unknown constants. (3) The variate $X_{1i}$ is uncorrelated with $X_{1j}$ for $i \neq j$.

Under the above assumptions, it can be shown that the best (linear) estimate of the constants $b_1, b_2, \ldots, b_p$ is given by minimizing the quantity

$$\sum_{i=1}^{N} \left[ X_{1i} - (b_1 + b_2 X_{2i} + \ldots + b_p X_{pi}) \right]^2 .$$

Here the term "best" is used in the sense that the estimates of the b's obtained in this way will have the smallest standard errors.

If in the equation

$$X_1 = b_1 + b_2 X_2 + b_3 X_3 + \ldots + b_p X_p \qquad (1)$$

constant values were assigned to $X_3, \ldots, X_p$, then $b_3 X_3 + \ldots + b_p X_p$ would be equal to some constant which could be combined with the constant $b_1$ to give a new constant $K$. Equation (1) could then be written as

$$X_1 = K + b_2 X_2 \qquad (2)$$

which is the equation of a straight line having a slope equal to $b_2$. Here $b_2$, which may be written as $b_{12.3 \ldots p}$, is the regression of $X_1$ on $X_2$ when $X_3, \ldots, X_p$ are constant.

5/ The first subscript refers to the type of variable, i.e. price, production, etc., and the second subscript refers to the observation.

6/ Roughly speaking this means that a variety of uncontrollable forces operate to give different values of $X_{1i}$, each value occurring with a certain, though possibly unknown, frequency.

If two or more observations in a scatter diagram of $X_1$ on $X_2$ had the same or approximately the same value of $X_3, \ldots, X_p$ then an estimate of $b_{12.3 \ldots p}$ could be obtained by drawing a best fitting line through them. If this process were repeated for several groups of points having the same $X_3, \ldots, X_p$ values, several lines whose slopes are estimates of the same partial regression coefficient, $b_{12.3 \ldots p}$, would be obtained 7/. Such lines have been called "drift lines" and their use is an integral part of the graphic method.

GRAPHIC METHOD AS APPLIED TO A LINEAR THREE VARIABLE PROBLEM 8/

Step 1. The scatter diagram

The first step in the graphic method of correlation as applied to a linear 3-variable problem (data in table 1) is the plotting of the dependent variable, $X_1$, against one of the independent variables, say $X_2$, in an ordinary scatter diagram (dots in section A, fig. 1). Since succeeding steps in the process are carried out in terms of vertical deviations, the dependent variable, $X_1$, is represented along the vertical scale of the chart and the independent variable, $X_2$, along the horizontal scale. It is essential that each observation, that is, each dot, on this and following charts be labeled so that the corresponding observation for each variable may be identified throughout the process. As it stands, without further analysis, the scatter of the two variables plotted in this first section indicates the simple correlation between $X_1$ and $X_2$ ($r_{12}$).

7/ The process of obtaining estimates of $b_{12.3 \ldots p}$ from the slope of these lines is equivalent to breaking the total sample into selected sub-samples and obtaining from each of these an independent estimate of $b_{12.3 \ldots p}$.

8/ Much of the material given in the next 5 pages has been stated by Bean, Ezekiel and others. It is presented here as a background for the discussion which follows.

Table 1.- Data used in figures 1 and 3

                                        Estimated X1
                                 -----------------------------------
                                 From mathematically   From 2nd
Observation    X1     X2     X3  calculated b's        approximations 1/

     1         52     50     60        53.9                 54.5
     2         20     38     24        25.6                 25.7
     3         62     30     50        58.8                 57.2
     4         30     42     26        24.7                 25.5
     5         36     50     44        37.2                 38.2
     6         40     40     40        40.8                 40.7
     7         42     24     26        38.4                 37.0
     8         50     38     50        52.7                 51.8
     9         20     48     22        15.9                 17.5
    10         60     42     58        58.0                 57.5
    11         22     30     20        27.8                 27.0

   Mean        39.4   39.3   38.2      39.4                 39.3

1/ Using graphic method for obtaining estimated X1 values.
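As a cross-check on the "mathematically calculated" column, the following minimal sketch solves the two normal equations for a three-variable problem directly. It is not taken from the paper: the helper names (`mean`, `cross`) are illustrative, and the data are a transcription of table 1 in which the $X_3$ entries for observations 6, 8 and 10 are read as 40, 50 and 58, an assumption chosen because it reproduces the stated mean of 38.2 and the drift-line groupings described in step 2.

```python
# Data transcribed from table 1. Assumption: X3 for observations 6, 8 and 10
# is read as 40, 50 and 58 so that the column mean equals the stated 38.2.
X1 = [52, 20, 62, 30, 36, 40, 42, 50, 20, 60, 22]
X2 = [50, 38, 30, 42, 50, 40, 24, 38, 48, 42, 30]
X3 = [60, 24, 50, 26, 44, 40, 26, 50, 22, 58, 20]


def mean(values):
    return sum(values) / len(values)


def cross(u, v):
    """Sum of products of deviations of u and v from their means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


s22, s33, s23 = cross(X2, X2), cross(X3, X3), cross(X2, X3)
s12, s13 = cross(X1, X2), cross(X1, X3)

# Solve the two normal equations for the partial regression coefficients.
det = s22 * s33 - s23 ** 2
b12_3 = (s12 * s33 - s13 * s23) / det
b13_2 = (s13 * s22 - s12 * s23) / det

# The fitted plane passes through the means of all three variables.
b1 = mean(X1) - b12_3 * mean(X2) - b13_2 * mean(X3)
estimated = [b1 + b12_3 * x2 + b13_2 * x3 for x2, x3 in zip(X2, X3)]

print(round(b12_3, 3), round(b13_2, 3))
print([round(e, 1) for e in estimated])
```

Under this transcription the coefficients come out at roughly -0.77 and 1.04, and the estimated values agree with the table's least-squares column to within a few tenths.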


Step 2. The drift lines

As many drift lines as possible are next drawn in this section. For example, from table 1 it will be seen that the pair of observations (3, 8), which have been connected by light lines in section A, are those having identical $X_3$ values. Likewise, observations 4, 7; 1, 10; and 2, 9, 11 have approximately the same $X_3$ values and have been connected with light lines.


Figure 1.- Graphic determination of first and second approximations to partial regressions for three variable problem. (Sections A and B.)

Step 3. The first approximation to $b_{12.3}$

The next step is to draw through the means of $X_1$ and $X_2$ in section A a line having a slope equal to the average slope of all of the drift lines. The slope of this line is the first approximation to the partial regression, $b_{12.3}$, the relation between $X_1$ and $X_2$ when $X_3$ is constant. The degree with which this average slope approximates the regression $b_{12.3}$ will depend on the stability of the slopes of the individual drift lines. In general, the amount of fluctuation that may be expected in the slopes of the drift lines will depend on (1) the number of observations that have the same or approximately the same value for $X_3$ on the basis of which any one of these drift lines is estimated, (2) the original variability in $X_2$, (3) the partial correlation of $X_1$ on $X_2$ when $X_3$ is constant.
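The drift-line step can be imitated numerically. The sketch below is only an illustration: it takes the four groups of observations named in step 2, fits a least-squares line within each group as a stand-in for the line drawn by eye, and uses an unweighted average of the slopes. The paper does not say how the slopes were averaged, so the unweighted mean, like the function name `least_squares_slope`, is an assumption.

```python
# (X2, X1) for each observation, transcribed from table 1.
points = {
    1: (50, 52), 2: (38, 20), 3: (30, 62), 4: (42, 30),
    7: (24, 42), 8: (38, 50), 9: (48, 20), 10: (42, 60), 11: (30, 22),
}

# Groups of observations with the same or nearly the same X3 (step 2).
groups = [(3, 8), (4, 7), (1, 10), (2, 9, 11)]


def least_squares_slope(pairs):
    """Slope of the least-squares line of y on x for a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    return sxy / sxx


# One slope per drift line; for a pair of points this is the slope of the
# line joining them.
slopes = [least_squares_slope([points[k] for k in group]) for group in groups]

# Assumed averaging rule: unweighted mean of the drift-line slopes.
first_approximation = sum(slopes) / len(slopes)

print([round(s, 3) for s in slopes])
print(round(first_approximation, 3))
```

With this averaging rule the first approximation is about -0.82, reasonably close to the least-squares value of about -0.77 even though the individual drift-line slopes vary widely.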

It is not essential to the mechanical accuracy of the process or to the statistical meaning of the method that the regression line pass through the means of the two series of data. This was done for simplicity. In a later section the modification introduced by not passing the regression line through the means will be explained.

It should be pointed out that the accuracy of the first approximation should not be judged by the goodness of fit of the average drift line to the scatter of $X_1$ on $X_2$. This follows from the fact that $b_{12.3}$ is a partial regression and will equal the simple regression $b_{12}$ only when the correlation between $X_2$ and $X_3$ is zero 9/, which in a sample is an unlikely occurrence.

Step  4.  Plotting  deviations  from  the  average  drift  line  against  X3 

9/ This is apparent from the formula

$$b_{12.3} = \frac{b_{12} - b_{13}\, b_{32}}{1 - b_{23}\, b_{32}}$$

which reduces to $b_{12.3} = b_{12}$ when $r_{23} = 0$.

Having obtained an approximation to $b_{12.3}$, the next step in the graphic procedure is the construction of a second scatter diagram in which the vertical (perpendicular to the $X_2$-axis, not to the regression line) deviations of the observations in section A from the average drift line, that is, the first approximation to the line with slope $b_{12.3}$, are plotted against the second independent variable, $X_3$ (section B, fig. 1). A convenient method for plotting deviations is to take a small card or piece of paper, draw in a zero line near the center of one side, and then take off the plus and minus vertical deviations by moving the zero line on the card along the regression in the first chart and marking off the distance for each observation. If the average drift line passes through the means, then the sum of these deviations will be approximately zero.

A horizontal line, representing the zero value for the deviations, is drawn through the center of the second chart. The exact location of this line and the range for the vertical axis may be easily obtained by placing the marked card on the chart and observing the maximum plus and minus deviations. Then the zero line on the card is moved along the zero line on the chart and the deviation for each observation is inserted above the $X_3$ value for that observation. An alternative method is to use dividers to transfer the deviations from one chart to the other. This completes the plotting of the second scatter diagram and the best visual estimate of the simple regression for these dots may be fitted to this scatter. 10/

10/ It would be possible to use drift lines based on constant values of $X_2$ to indicate the slope of this line. The advisability of using this method will be discussed in a later section.

In fitting simple regressions graphically, it must be remembered that the sum of squares of the deviations should be reduced to a minimum. Thus, in general, more weight should be given to large deviations than to small ones. For example, suppose two observations deviated from one guessed regression by 0.4 and 1.0 units, respectively, and from a second guessed regression by 0.6 and 0.8 units. Although the sum of the absolute deviations is the same for the two regressions, the sum of the squares of the deviations is smaller for the latter, and consequently, the second line is preferable to the first at least so far as these two points are concerned. With practice, the analyst will be able to approximate such a regression closely. If the first regression was put through the means of $X_1$ and $X_2$, the second regression should pass through the mean of $X_3$ at the zero line.
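The arithmetic behind this two-point example can be checked in a few lines. The sketch reads the second pair of deviations as 0.6 and 0.8, an assumption consistent with the statement that the absolute sums are equal; the variable names are illustrative only.

```python
# Deviations of the same two observations from two guessed regressions.
first_guess = [0.4, 1.0]
second_guess = [0.6, 0.8]  # assumed reading of the second pair

abs_first = sum(abs(d) for d in first_guess)
abs_second = sum(abs(d) for d in second_guess)
sq_first = sum(d * d for d in first_guess)
sq_second = sum(d * d for d in second_guess)

# The absolute sums agree, but the sum of squares penalizes the single large
# deviation of 1.0 more heavily, so the second guess is preferred.
print(abs_first, abs_second)
print(sq_first, sq_second)
```

The sums of squares are 1.16 and 1.00 apart from floating-point rounding, which is why the line with the more evenly spread deviations is the better least-squares choice.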

If the deviations from the regression in the first chart are considered as a new variable $V_1$, so that

$$V_1 = X_1 - b_{12.3} X_2 \qquad (3)$$

and if the first approximation to $b_{12.3}$ is equal to the mathematically calculated $b_{12.3}$ for the sample of data, then it can easily be shown that the simple regression between $V_1$ and $X_3$ is equal to $b_{13.2}$, that is, the regression between $X_1$ and $X_3$ when $X_2$ is constant. 11/ If the first approximation to $b_{12.3}$ does not equal the mathematically calculated $b_{12.3}$, then the regression in the second chart may not equal $b_{13.2}$. However, the next section shows how the first approximations may be adjusted so that they will more nearly equal the mathematically calculated regression coefficients, $b_{12.3}$ and $b_{13.2}$.
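A short sketch of why the regression of $V_1$ on $X_3$ reproduces $b_{13.2}$ when the exact $b_{12.3}$ is used. The notation $S_{jk}$ for the sum of products of deviations of $X_j$ and $X_k$ from their means is introduced here for convenience and is not the paper's; the full argument is the one referred to in note 1 of the Appendix.

```latex
\begin{align*}
\text{Least-squares normal equation for } X_3:\quad
  & S_{13} = b_{12.3}\, S_{23} + b_{13.2}\, S_{33}, \\[4pt]
\text{Slope of } V_1 = X_1 - b_{12.3} X_2 \text{ on } X_3:\quad
  & \frac{S_{13} - b_{12.3}\, S_{23}}{S_{33}}
    = \frac{b_{13.2}\, S_{33}}{S_{33}}
    = b_{13.2}.
\end{align*}
```

If an approximate slope is used in place of $b_{12.3}$, the numerator no longer reduces to $b_{13.2}\, S_{33}$, and the slope found in the second chart departs from $b_{13.2}$ by an amount proportional to $S_{23}/S_{33}$ times the error in the first slope.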


Step  5.  The  process  of  successive  approximation 

The mathematical process of successive approximation is a systematic method of finding the linear regressions in section A and section B, which reduce the sum of squares of the deviations from the regression in section B to a minimum. Since the graphic method duplicates the steps of the mathematical method, successive corrections in the first approximations to $b_{12.3}$ and $b_{13.2}$ will tend to approach the mathematically calculated values.

11/ See note 1 in the Appendix.

The method may be described mathematically as follows: Some value is chosen as a first approximation to $b_{12.3}$. 12/ This may be called $b_{12.3}^{(1)}$. Deviations of $X_1$ from a regression with this slope are related to $X_3$ and a regression obtained such that the sum of squares of the deviations about it is reduced to a minimum. The slope of this second regression may be called $b_{13.2}^{(1)}$. Deviations of $X_1$ from the line with slope $b_{13.2}^{(1)}$ are then related to $X_2$ and a regression obtained such that the sum of squares of the deviations about it is reduced to a minimum. The slope of this regression may be called $b_{12.3}^{(2)}$. Deviations of $X_1$ from the line with slope $b_{12.3}^{(2)}$ are then related to $X_3$ and a regression again obtained such that the sum of squares of the deviations about it is reduced to a minimum. The slope of this regression may be called $b_{13.2}^{(2)}$. It can be shown that if this process is continued, each succeeding approximation will be nearer the mathematically calculated value than the preceding one and $b_{12.3}^{(n)}$ and $b_{13.2}^{(n)}$ can be made to come as close to $b_{12.3}$ and $b_{13.2}$ as desired if $n$ is made large enough, that is, if enough successive approximations are made. 13/ The factors affecting the speed of convergence will be discussed as soon as the graphic equivalent of this method has been described.
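The alternating procedure just described can be sketched in a few lines. This is an illustration rather than the authors' computation: the function name `successive_approximations`, the arbitrary starting slope of 0.0, and the transcription of table 1 (with $X_3$ for observations 6, 8 and 10 read as 40, 50 and 58) are all assumptions. Each simple regression is computed from sums of products of deviations, which is equivalent to fitting a line through the means.

```python
# Data transcribed from table 1 (X3 for observations 6, 8, 10 assumed to be
# 40, 50 and 58, consistent with the stated column mean of 38.2).
X1 = [52, 20, 62, 30, 36, 40, 42, 50, 20, 60, 22]
X2 = [50, 38, 30, 42, 50, 40, 24, 38, 48, 42, 30]
X3 = [60, 24, 50, 26, 44, 40, 26, 50, 22, 58, 20]


def cross(u, v):
    """Sum of products of deviations of u and v from their means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def successive_approximations(x1, x2, x3, b12_first, steps):
    """Return [(b12.3^(n), b13.2^(n))] for n = 1 .. steps."""
    s22, s33, s23 = cross(x2, x2), cross(x3, x3), cross(x2, x3)
    s12, s13 = cross(x1, x2), cross(x1, x3)
    history = []
    b12 = b12_first
    for _ in range(steps):
        # Simple regression of the deviations X1 - b12*X2 on X3.
        b13 = (s13 - b12 * s23) / s33
        history.append((b12, b13))
        # Simple regression of the deviations X1 - b13*X3 on X2.
        b12 = (s12 - b13 * s23) / s22
    return history


history = successive_approximations(X1, X2, X3, b12_first=0.0, steps=6)
for n, (b12, b13) in enumerate(history, start=1):
    print(n, round(b12, 4), round(b13, 4))
```

Starting from a deliberately poor first slope, the pairs settle quickly near -0.77 and 1.04 for these data, because the correlation between $X_2$ and $X_3$ in this transcription is only about 0.33.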


12/ In the graphic method this value is often chosen on the basis of drift lines as discussed in step 3.

13/ See note 2 in the Appendix. In the remainder of the discussion the notations, $b_{12.3}$ and $b_{13.2}$, will be used only for the mathematical values obtained by the usual methods, and a superscript will be given if the value was obtained by some method of successive approximation.

The steps in the graphic method discussed in preceding sections have given first approximations to $b_{12.3}$ and $b_{13.2}$, the former being given by the average slope of the drift lines and the latter by the simple regression between $V_1$ and $X_3$. The next step is to take the vertical deviations from the regression in the second chart (fig. 1, section B) and plot them about the average drift line, that is, the line with slope $b_{12.3}^{(1)}$, in the first chart so that the deviation for each observation is directly above or below the original dot for that observation, that is, retains the same $X_2$ value. It is advisable to use colored pencils for this purpose so that the deviations can be plotted in a different color than were the original observations. For purposes of discussion, however, it will be assumed that the original observations are indicated by black dots and the deviations from the regression line in section B by circles. Next, a simple regression is drawn through the means of $X_1$ and $X_2$ and the circles in such a way that it appears to reduce the scatter about itself to a minimum. This is the dashed line in section A and gives the second approximation, $b_{12.3}^{(2)}$.

It should be noted that the slope of the new regression, $b_{12.3}^{(2)}$, is exactly equal to the slope that would be obtained if $X_1$ were plotted against $X_3$, a regression drawn in having a slope equal to $b_{13.2}^{(1)}$, deviations from this regression plotted against $X_2$, and a simple regression drawn in. The scatter about such a regression would also be exactly equal to the scatter of the circles about the dashed line in the first chart (fig. 1, section A).

This point is so fundamental to the understanding of the graphic method that it seems advisable to give the proof here. The deviations from the line with slope $b_{12.3}^{(1)}$ in the first chart are given by

$$V_1 = X_1 - b_{12.3}^{(1)} X_2 \qquad \text{(Equation 3)}$$

Since in the second chart $V_1$ is related to $X_3$, deviations from the line with slope $b_{13.2}^{(1)}$ in this chart are given by

$$V_2 = V_1 - b_{13.2}^{(1)} X_3 = X_1 - b_{12.3}^{(1)} X_2 - b_{13.2}^{(1)} X_3 \qquad (4)$$

But the circles in the first chart are equal to the regression coefficient $b_{12.3}^{(1)}$ times $X_2$ for that observation, plus $V_2$, so that

$$\text{Circles} = b_{12.3}^{(1)} X_2 + V_2 = b_{12.3}^{(1)} X_2 + X_1 - b_{12.3}^{(1)} X_2 - b_{13.2}^{(1)} X_3 = X_1 - b_{13.2}^{(1)} X_3$$

Thus, the results are the same as those which would have been obtained if the value of $X_1 - b_{13.2}^{(1)} X_3$ had been obtained in a separate chart and related to $X_2$ in a separate chart. In other words, the circles and the dashed line in section A give exactly the same picture as would be given in a second chart, with the independent variables reversed, if the longer method had been used.

Now that $b_{12.3}^{(2)}$ has been obtained, deviations of the circles from the line with slope $b_{12.3}^{(2)}$ (the dashed line) are plotted about the line with slope $b_{13.2}^{(1)}$ in section B as circles, keeping the same $X_3$ values. Then a new regression is drawn in on this chart (the dashed line) such that the sum of squares of the deviations of the circles from it appears to be reduced to a minimum. The slope of this regression is $b_{13.2}^{(2)}$. Deviations from this regression are then plotted about the line with slope $b_{12.3}^{(2)}$ (the dashed line in section A). These deviations may be plotted as X's and a simple regression drawn through them as a dotted line. The slope of this regression is $b_{12.3}^{(3)}$ (in the example used, the dotted line would have coincided with the dashed line and consequently was not drawn in. In this instance $b_{12.3}^{(2)}$ is the final approximation.) The process is once more repeated by taking deviations of the X's from the dotted line with slope $b_{12.3}^{(3)}$ and plotting them about the line with slope $b_{13.2}^{(2)}$. This process is continued until no further correction appears to be needed in $b_{12.3}^{(n)}$, that is, until the simple regression for the deviations from the line with slope $b_{13.2}^{(n)}$ plotted about the line with slope $b_{12.3}^{(n)}$ coincides with the line with slope $b_{12.3}^{(n)}$.

The above discussion shows that the method of successive approximation as outlined by Bean, is a short-cut graphic method analogous to the mathematical method of successive approximation discussed on page 10. 14/

Speed  of  Convergence 

The speed with which the successive approximations lead to stable results is of interest for two reasons: (1) It takes time to make successive approximations and the charts become messy after several sets of dots have been inserted on them and (2) if the convergence is too slow, the analyst may think that no further correction is needed in the line with slope $b_{12.3}^{(n)}$ when in reality its slope is still quite different from the mathematically calculated value.

By making algebraic substitutions in the mathematical method of successive approximation for a three-variable problem 15/, it can be shown that

$$b_{12.3} - b_{12.3}^{(n)} = r_{23}^{\,2n-2}\,\bigl(b_{12.3} - b_{12.3}^{(1)}\bigr) \qquad (5)$$

and

$$b_{13.2} - b_{13.2}^{(n)} = -\frac{s_2}{s_3}\, r_{23}^{\,2n-1}\,\bigl(b_{12.3} - b_{12.3}^{(1)}\bigr) \qquad (6)$$

$$\qquad\qquad\quad\; = r_{23}^{\,2n-2}\,\bigl(b_{13.2} - b_{13.2}^{(1)}\bigr) \qquad (7)$$

where $b_{12.3}^{(k)}$ is the kth approximation to $b_{12.3}$, $b_{13.2}^{(k)}$ is the kth approximation to $b_{13.2}$, $b_{12.3}$ and $b_{13.2}$ are the mathematically calculated values of $b_{12.3}$ and $b_{13.2}$ respectively, $r_{23}$ is the correlation between the independent variables $X_2$ and $X_3$ and $s_2$ and $s_3$ are the standard deviations of $X_2$ and $X_3$, respectively.

14/ The fact that graphically determined successive approximations to $b_{12.3}$ will tend to converge toward the mathematically calculated value has been demonstrated in specific cases by various writers on the graphic method. However, so far as is known, no mathematical discussion of this convergence or of the factors affecting the speed of convergence has been presented.

15/ See note 2 in the Appendix.

Equation (5) states that the difference between the mathematically calculated $b_{12.3}$ and any given approximation is equal to a function of the correlation between the independent variables times the error which was made in the first approximation to $b_{12.3}$. It shows that the higher the correlation between the independent variables, the slower will be the speed of convergence. 16/

Equation (5) also indicates the importance of the drift lines, since if the error in $b_{12.3}^{(1)}$ is small, one or two iterations may be enough, but if the error is large and the correlation between the independent variables is also large, 6 or 8 or even more successive approximations may be required to bring the slope of the regression to within 0.1 of the correct value.
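The rate implied by equation (5) is easy to tabulate: each iteration multiplies the remaining error by the square of $r_{23}$. The sketch below is illustrative only; the function names and the sample first-approximation error of 0.5 are assumptions, not values from the paper.

```python
def percent_error_remaining(r23, iterations):
    """Percent of the first-approximation error left after a number of
    iterations, from equation (5): 100 * r23 ** (2 * iterations)."""
    return 100.0 * r23 ** (2 * iterations)


def iterations_needed(r23, first_error, tolerance):
    """Smallest number of iterations after which the error implied by
    equation (5) is no larger than the tolerance. Requires |r23| < 1."""
    if abs(r23) >= 1:
        raise ValueError("the process does not converge unless |r23| < 1")
    error = abs(first_error)
    count = 0
    while error > tolerance:
        error *= r23 ** 2
        count += 1
    return count


for r in (0.2, 0.9):
    print(r, [round(percent_error_remaining(r, k), 2) for k in (1, 2)])

# Hypothetical first-approximation error of 0.5, target accuracy of 0.1.
print(iterations_needed(0.2, 0.5, 0.1), iterations_needed(0.9, 0.5, 0.1))
```

For a correlation of 0.2 the error drops to 4 percent and then 0.16 percent of its original size; for 0.9 it is still 81 percent and then 65.61 percent. With the hypothetical error of 0.5, one iteration suffices in the first case and eight are needed in the second, in line with the "6 or 8 or even more" mentioned above.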

As mentioned in footnote 10, drift lines could be used in the second chart as well as in the first. It should be borne in mind, however, that the function of the drift lines is to give a good "guessed" value for $b_{12.3}$ or $b_{13.2}$. Whether they should be used in both charts or in only one would depend upon the availability of drift lines in sufficiently large numbers to give good approximations to the regression lines. It might be well to use them in obtaining first approximations to both $b_{12.3}$ and $b_{13.2}$ and then to use the method of successive approximation as outlined above in the remainder of the analysis.

Illustrations  of  Effect  of  Correlation  Between  the  Independent 
Variables  on  Speed  of  Convergence 

Although Bean and Ezekiel have always considered the use of drift lines an integral part of the graphic method, it is of interest to investigate what might happen in a particular example if drift lines were not used and completely arbitrary values were chosen as first approximations to $b_{12.3}$. This has been done in figure 2. For purposes of illustration, a line known to have a slope materially different from the mathematically calculated $b_{12.3}$ was drawn on the first chart in each example and the problems were then analyzed by the graphic method of successive approximation discussed above.

These examples should not be confused with the example in figure 1, which was handled by the usual graphic correlation technique. Fairly accurate results would have been gotten from each of the examples in figure 2 if drift lines had been used. They simply illustrate the results which may be obtained if drift lines are not used or if reliable drift lines are not available.

16/ Using the size of the original error (that is, $b_{12.3} - b_{12.3}^{(1)}$) as a base, it can be stated from equation (5) that the percent error left after the nth iteration is given by $r_{23}^{\,2n}$ times 100. Thus, if $r_{23} = .2$, the error remaining after one iteration is 4 percent of the original error and after two iterations is 0.16 percent, while if $r_{23} = .9$, the error remaining after one iteration is 81 percent and after two iterations is 65.61 percent.

Two examples are shown in figure 2. 17/ For one, $r_{23} = .19$; for the other, $r_{23} = -.91$. When $r_{23} = .19$, the second approximation to $b_{12.3}$ coincides almost exactly with the mathematically calculated $b_{12.3}$. But when $r_{23} = -.91$ the third approximation is still much closer to the first approximation than to the correct $b_{12.3}$. The same is true of the approximations to $b_{13.2}$. As a check, the mathematical method of successive approximation has been applied to these problems, using the same arbitrary first approximations to $b_{12.3}$. Table 2 shows the results of the analyses.
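Equation (5) can be set against the slow-converging column of table 2. The sketch below is a consistency check, not a reproduction of the authors' calculation (the underlying data are in the Appendix and are not used here). It takes the rounded values printed in the table for the problem with $r_{23} = -.91$: a first approximation of -1.50 and a final coefficient of .12. The function name `approximation` is illustrative.

```python
def approximation(b_final, b_first, r23, n):
    """nth approximation to b12.3 implied by equation (5)."""
    return b_final - r23 ** (2 * n - 2) * (b_final - b_first)


# Selected b12.3 entries from table 2 for the problem with r23 = -.91.
table2 = {1: -1.50, 2: -1.23, 3: -1.00, 4: -0.81, 5: -0.65,
          10: -0.19, 15: 0.00, 20: 0.07}

predicted = {n: approximation(0.12, -1.50, -0.91, n) for n in table2}

for n in sorted(table2):
    print(n, table2[n], round(predicted[n], 2))
```

The predicted values track the tabulated ones to within about 0.01 to 0.02; differences of that size are to be expected because the correlation and the coefficients entered here are rounded to two decimals.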

The above discussion of successive approximation throws light on
several controversies that have arisen regarding graphic correlation. Certain
users of the method have assumed that the first approximation to the line
with slope b12.3 could be drawn in without reference to the slopes of the
drift lines. In case the correlation between the independent variables is
low, this is permissible since the second approximation will correct most of
17/ The data for these problems are given in the Appendix.
[Figure 2.- Graphic illustrations of speed of convergence when drift lines
are not used. Surviving panel title: "Problem in which r23 = -.91"; axis
labels X2 and X3. U. S. Department of Agriculture, Bureau of Agricultural
Economics, Neg. 38823.]
the error in the first; however, if the correlation between the independent
variables is high, drawing in the first approximation without reference to the
drift lines may lead to serious errors. In the first place, if the analyst


Table 2.- Values of mathematically calculated successive approximations
          to the partial regressions for a three-variable problem,
          when arbitrary values are chosen for $b_{12.3}^{(1)}$

                          When r23 = .19        When r23 = -.91
Approximation             b12.3     b13.2       b12.3     b13.2

1st                       -1.50      -.47       -1.50       .17
2nd                         .86      -.91       -1.23       .53
3rd                         .95      -.93       -1.00       .83
4th                         .96      -.93        -.81      1.07
5th                         .96      -.93        -.65      1.28
10th                         --        --        -.19      1.90
15th                         --        --        -.00      2.15
20th                         --        --         .07      2.24
25th                         --        --         .10      2.28
30th                         --        --         .11      2.30
35th                         --        --         .12      2.30
39th                         --        --         .12      2.31

Mathematically
calculated regression
coefficients                .96      -.93         .12      2.31

does not employ drift lines, he is likely to make his first approximation to
b12.3 approximately equal to b12. It has already been mentioned that b12
may differ materially from b12.3, particularly if r23 is large. Secondly, if
the intercorrelation is high, the second approximation may be about the same
as the first, so that the analyst may think he has reached a stable result
when the regression still has a slope considerably different from b12.3. As
the graphic method gives little indication of the size of r23, it is advisable
always to draw in as many drift lines as possible.
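Both dangers can be seen numerically. The sketch below is illustrative only and uses the data for the problem in which r23 = -.91 from table 1 of the Appendix. It compares the simple regression b12 with the partial regression b12.3, and then carries out one round of successive approximation starting from b12, as an analyst working without drift lines might. All names in the code are assumptions of this sketch.

```python
# Data for the problem in which r23 = -.91 (Appendix, table 1).
X1 = [16, 12, 4, 36, 32, 20, 4, 1, 31, 24]
X2 = [18, 21, 27, 9, 9, 8, 20, 25, 1, 12]
X3 = [8, 6, 2, 18, 14, 10, 4, 1, 16, 11]


def deviations(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def cross(u, v):
    return sum(a * b for a, b in zip(u, v))


x1, x2, x3 = deviations(X1), deviations(X2), deviations(X3)
s12, s13, s23 = cross(x1, x2), cross(x1, x3), cross(x2, x3)
s22, s33 = cross(x2, x2), cross(x3, x3)

r23 = s23 / (s22 * s33) ** 0.5
b12_simple = s12 / s22                                          # X3 ignored
b12_partial = (s12 * s33 - s13 * s23) / (s22 * s33 - s23 ** 2)  # b12.3

# One round of successive approximation, starting from the simple regression.
b13_first = (s13 - b12_simple * s23) / s33
b12_second = (s12 - b13_first * s23) / s22

print(f"r23                         = {r23:6.2f}")
print(f"b12   (first approximation) = {b12_simple:6.2f}")
print(f"second approximation        = {b12_second:6.2f}")
print(f"b12.3 (correct value)       = {b12_partial:6.2f}")
```

With so high an intercorrelation the second approximation moves only a small part of the way from b12 toward b12.3, which is the situation in which a stable result may be assumed too soon.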

On the other hand, some have assumed that the drift lines gave them
such a close approximation to the correct regression that they have not used
the successive approximation procedure, particularly if they were only making
a rough analysis to be verified later by the mathematical method. This also
appears to be objectionable, especially if the drift lines fluctuate
considerably or if only one or two drift lines can be drawn.

In general, the chances for mechanical error are greater in the graphic
than  in  the  mathematical  method  and  for  this  reason  the  safeguards  of  both 
drift  lines  and  successive  approximations  are  advisable.  (Bean  and  Ezekiel 
have  always  considered  them  an  integral  part  of  the  method.)  If  the  wrong  re¬ 
gressions  are  obtained,  the  scatter  in  the  final  chart  may  be  so  large  that 
the  particular  analysis  will  be  discarded,  whereas  if  the  correct  or  approxi¬ 
mately  correct  regressions  had  been  obtained  the  deviations  from  the  regres¬ 
sion  in  the  final  chart  would  have  been  smaller  and  the  analysis  would  have 
been  used  or  at  least  worked  out  mathematically. 

The discussion so far also throws light on at least some of the
effects of correlation between the independent variables on graphic
correlation analyses. It has sometimes been thought that the partial
regressions were completely indeterminate by the graphic method if the
correlation between the independent variables was high. From the above it can
be seen that the indeterminateness may be due entirely to the fact that for
high correlations between the independent variables the speed of convergence
is slow. With slow convergence the eye may not detect minute changes which
should be made in the regressions in order to reduce the sum of squares of
the deviations to a minimum and, consequently, the process of successive
approximation may be stopped before the correct results have been obtained.

Effect  of  Not  Passing  Regressions  Through  the  Means 
One further point needs to be considered before the subject of
regressions is left. In Bean's description of the graphic method no mention
was made of drawing the regressions through the means of the variables. In
his original article, Bean states: "At this point it may be observed that the
arbitrary
placing  of  the  approximation  curves  without  reference  to  the  average  values 
of  and  of  the  other  variables  does  not  affect  the  values  of  computed 
from  the  curves.  For  example,  had  the  approximation  curve  in  section  1  been 
placed  higher,  the  residuals  in  sections  2  and  3  would  have  been  correspond¬ 
ingly  decreased  and  the  curves  lowered."  18/  F.  L.  Thomsen,  on  the  other 
hand,  suggests  passing  the  regressions  through  the  means  of  the  variables  in 
order  that  each  regression  will  indicate  the  true  net  relation  between  each 
of  the  independent  variables  and  the  dependent  variable  in  the  analysis.  19/ 

It can be shown algebraically that if a line having a slope equal to
b12.3 is drawn in the first chart so that it passes through a point d units
above the mean of X1 at the mean of X2, and if deviations from this line are
plotted against X3 in a second chart, the line which will reduce the sum of
the squares of the deviations to a minimum in this chart will have a slope
equal to b13.2 and will pass through a point d units below the zero line at
the mean of X3. Moreover, the scatter about this regression will be the same
as the scatter would have been if the first regression had been passed through
the means of X1 and X2 and the second through the zero line and the mean of
X3. 20/ Thus, so far as the graphic method itself is concerned, it makes
little difference whether the regression in the first chart is drawn through
the means or not. It may be preferable, however, in the graphic method, to
pass the lines through the means, for when this is done the slope in the
second chart is the only constant to be determined and no error is introduced
in this chart in locating the intercept.

18/ Bean, Op. cit., p. 393.

19/ Thomsen, F. L. Op. cit., p. 229.

20/ See note 3 in the Appendix.
Step 6. Indicating the Multiple Correlation 21/

The scatter about the regression line in chart B, figure 1, represents
the residual variation in X1 which has not been accounted for by X2 and X3.
It has been assumed by some that this final scatter is a direct measure of
the multiple correlation. This is true only when the variation is related
to the variation in X1 around its mean. But to compare mentally the total
fluctuation in X1 with the variation remaining in section B is a difficult
and untrustworthy method. The best method for obtaining an estimate of the
multiple correlation coefficient is to calculate it from the formula

$$R_{1.23}^{2} = 1 - \frac{\sum d_{1.23}^{2}}{\sum X_1^{2} - (\bar{X}_1)^{2}\, n} \qquad (8)$$

in which d equals the deviation of any point from the final regression in
section B, $\bar{X}_1$ equals the mean of X1, and n equals the number of
observations in the sample.
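A minimal sketch of the calculation in formula (8) follows. It is illustrative only and assumes the data for the problem in which r23 = -.91 from table 1 of the Appendix, with the deviations taken from the mathematically calculated regressions rather than from a chart. The variable names are assumptions of this sketch.

```python
# Data for the problem in which r23 = -.91 (Appendix, table 1).
X1 = [16, 12, 4, 36, 32, 20, 4, 1, 31, 24]
X2 = [18, 21, 27, 9, 9, 8, 20, 25, 1, 12]
X3 = [8, 6, 2, 18, 14, 10, 4, 1, 16, 11]
n = len(X1)
m1, m2, m3 = sum(X1) / n, sum(X2) / n, sum(X3) / n

# Sums of squares and products of deviations from the means.
s12 = sum((a - m1) * (b - m2) for a, b in zip(X1, X2))
s13 = sum((a - m1) * (c - m3) for a, c in zip(X1, X3))
s23 = sum((b - m2) * (c - m3) for b, c in zip(X2, X3))
s22 = sum((b - m2) ** 2 for b in X2)
s33 = sum((c - m3) ** 2 for c in X3)

# Mathematically calculated partial regressions.
b12_3 = (s12 * s33 - s13 * s23) / (s22 * s33 - s23 ** 2)
b13_2 = (s13 * s22 - s12 * s23) / (s22 * s33 - s23 ** 2)

# d: deviation of each observation from the final regression.
d = [X1[i] - (m1 + b12_3 * (X2[i] - m2) + b13_2 * (X3[i] - m3)) for i in range(n)]

# Formula (8): the denominator is the variation of X1 around its mean.
r_squared = 1 - sum(e * e for e in d) / (sum(v * v for v in X1) - m1 ** 2 * n)
R = r_squared ** 0.5
print(f"R1.23 = {R:.3f}")
```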

The multiple correlation may also be indicated graphically by
constructing a chart in which the actual values of X1 are plotted against the
estimated values of X1 obtained from the net relationships between X1 and
X2 and X3. Many of the published graphic correlation studies have included
charts in which the two series are plotted in the form of a time series
(section A, figure 3). A somewhat more reliable visual estimate of the
multiple correlation may be obtained from a scatter diagram in which the
actual value of X1 is plotted against the estimated value of X1 around a line
drawn at a 45 degree angle through the origin (section B). The multiple
correlation R1.23 can be calculated from this scatter by the usual formula
for a simple

21/ Most of the material in this and the following section, including
equation (8), is given in Ezekiel, Op. cit.
correlation, using X1 and the estimated value of X1 as the variables. This,
however, involves somewhat more work than the use of the formula given above.

Obtaining the estimated values of X1

The mathematical formula for obtaining the estimated value of X1 for
any observation is

$$\text{Est. } X_1 = \bar{X}_1 + b_{12.3}\,(X_2 - \bar{X}_2) + b_{13.2}\,(X_3 - \bar{X}_3). \qquad (9)$$

In order to obtain the estimated value of X1 for any observation, say
X11, graphically, the vertical distance from the zero line in section B,
figure 1, to the final regression line is obtained at the point X31. This
distance, which equals b13.2 (X31 - mean of X3), is added to the vertical
distance in section A from the point X21 on the X2 axis to the final
regression line in this chart. Since the latter distance equals the mean of
X1 plus b12.3 (X21 - mean of X2), the sum of the two distances equals the
estimated value of X11, as can be seen from equation (9). These principles
are illustrated graphically in figure 4.
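The addition of the two vertical distances can be mimicked arithmetically. The sketch below is illustrative only and assumes the data for the problem in which r23 = -.91 from table 1 of the Appendix together with the mathematically calculated regressions. The names `section_a` and `section_b` are labels chosen for this sketch to stand for the heights read from the two charts.

```python
# Data for the problem in which r23 = -.91 (Appendix, table 1).
X1 = [16, 12, 4, 36, 32, 20, 4, 1, 31, 24]
X2 = [18, 21, 27, 9, 9, 8, 20, 25, 1, 12]
X3 = [8, 6, 2, 18, 14, 10, 4, 1, 16, 11]
n = len(X1)
m1, m2, m3 = sum(X1) / n, sum(X2) / n, sum(X3) / n

s12 = sum((a - m1) * (b - m2) for a, b in zip(X1, X2))
s13 = sum((a - m1) * (c - m3) for a, c in zip(X1, X3))
s23 = sum((b - m2) * (c - m3) for b, c in zip(X2, X3))
s22 = sum((b - m2) ** 2 for b in X2)
s33 = sum((c - m3) ** 2 for c in X3)
b12_3 = (s12 * s33 - s13 * s23) / (s22 * s33 - s23 ** 2)
b13_2 = (s13 * s22 - s12 * s23) / (s22 * s33 - s23 ** 2)

# Height of the final regression line in section A, measured from the X2 axis.
section_a = [m1 + b12_3 * (v - m2) for v in X2]
# Height of the final regression line in section B, measured from the zero line.
section_b = [b13_2 * (v - m3) for v in X3]
# Equation (9): the estimate is the sum of the two distances.
estimated = [a + b for a, b in zip(section_a, section_b)]

for i in range(3):
    print(f"obs {i + 1}: {section_a[i]:6.2f} + {section_b[i]:6.2f} = "
          f"{estimated[i]:6.2f}   (actual X1 = {X1[i]})")
```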

Speed  of  Convergence  of  Multiple  Correlation  22/ 

As  was  pointed  out  above,  if  the  correlation  between  the  independent 
variables  is  high,  the  speed  of  convergence  for  the  regressions  is  reduced 
and  regressions  considerably  different  from  the  mathematically  calculated 
ones  may  be  accepted  as  the  final  regressions.  Some  factors  which  appear  to 
affect  the  speed  of  convergence  of  the  multiple  correlation  coefficient  are 
discussed  here. 

It  has  been  pointed  out  that  the  regressions  which  will  make  the  sum 
of  squares  of  the  deviations  from  the  final  regression  in  the  second  chart  a 
minimum  are  the  mathematically  calculated  partial  regressions.  From  equation 

22/ The general principles discussed here have been presented by others.
See, for instance, Bean, Ezekiel, Malenbaum, and Black. The use of the
short-cut graphic method of multiple correlation. Comments. Quarterly
Journal of Economics. February 1940. p. 336.
[Figure 3.- Graphic methods for representing the multiple correlation.
Section A; Section B (axis label: X1 estimated). U. S. Department of
Agriculture, Bureau of Agricultural Economics, Neg. 38824.]

[Figure 4.- Graphic method for determining estimated values of X1. Axis
labels X2 and X3. U. S. Department of Agriculture, Bureau of Agricultural
Economics, Neg. 38825.]
(8) it is evident that the size of the multiple correlation coefficient
depends upon the size of the deviations from the final regression. Thus,
the computed multiple correlation will depend on the regressions obtained,
and if the regressions are inaccurate the computed multiple correlation
coefficient will be inaccurate. The problem to be determined is how large
this inaccuracy can be expected to be and what determines the size of the
error.

In  order  to  give  a  partial  answer  to  the  above  questions,  the  multiple 
correlation  coefficients  have  been  computed  from  the  data  used  in  illustrating 
successive  approximations  to  the  regressions  (table  2  and  table  1  in  the 
appendix).  Each  of  the  three  methods  of  computation  which  have  been  outlined 
for  obtaining  the  multiple  correlation  coefficient  has  been  used,  based  on 
the  mathematically  calculated  regressions  and  on  the  first  four  approxima¬ 
tions  to  the  partial  regressions  obtained  by  the  mathematical  method  of 
successive  approximation.  The  results  are  given  in  table  3. 

Table 3.- Successive approximations to the multiple correlation coefficient

                     Problem in which r23 = .19     Problem in which r23 = -.91
Approximation 1/     Method A  Method B  Method C   Method A  Method B  Method C

R1                  Undefined    -.308  Undefined      .886      .902      .982
R2                       .804     .805      .782       .919      .926      .983
R3                       .806     .806      .805       .941      .945      .984
R4                       .806     .806      .806       .956      .958      .985

Mathematically
computed R               .806     .806      .806       .990      .990      .990

1/ Rn equals the multiple correlation coefficient based on the nth
approximation to b12.3 and b13.2.


In  the  following  formulas,  the  variables  are  taken  as  deviations  from 
their  respective  means. 
Method A.- Use of the formula

$$R_n^{2} = 1 - \frac{\sum d_n^{2}}{\sum x_1^{2}},$$

in which the deviations have been computed by use of the formula

$$d_n = x_1 - b_{12.3}^{(n)}\, x_2 - b_{13.2}^{(n)}\, x_3 .$$

Method B.- Calculating the simple correlation between $x_1$ and the estimated
value for $x_1$, the latter being obtained by use of the formula

$$\text{Est. } x_1 = b_{12.3}^{(n)}\, x_2 + b_{13.2}^{(n)}\, x_3 .$$

Method C.- Use of the formula

$$R_n^{2} = \frac{b_{12.3}^{(n)} \sum x_1 x_2 + b_{13.2}^{(n)} \sum x_1 x_3}{\sum x_1^{2}} .$$

(This formula would not be used in working with the graphic method
but is frequently used in the mathematical method.)
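The three methods can be compared directly in code. The following sketch is illustrative only. It assumes the data for the problem in which r23 = -.91 from table 1 of the Appendix and evaluates Methods A, B and C first for the arbitrary first approximation (b12.3 = -1.50) and then for the mathematically calculated regressions. Returning `None` where the computed square is negative is a convention of this sketch, chosen to correspond to the "Undefined" entries of table 3.

```python
# Data for the problem in which r23 = -.91 (Appendix, table 1).
X1 = [16, 12, 4, 36, 32, 20, 4, 1, 31, 24]
X2 = [18, 21, 27, 9, 9, 8, 20, 25, 1, 12]
X3 = [8, 6, 2, 18, 14, 10, 4, 1, 16, 11]


def deviations(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def cross(u, v):
    return sum(a * b for a, b in zip(u, v))


x1, x2, x3 = deviations(X1), deviations(X2), deviations(X3)
s11, s22, s33 = cross(x1, x1), cross(x2, x2), cross(x3, x3)
s12, s13, s23 = cross(x1, x2), cross(x1, x3), cross(x2, x3)


def root(value):
    """Square root, or None where the squared coefficient is negative."""
    return value ** 0.5 if value >= 0 else None


def three_methods(b12, b13):
    """Multiple correlation by Methods A, B and C for given regressions."""
    d = [a - b12 * b - b13 * c for a, b, c in zip(x1, x2, x3)]
    est = [b12 * b + b13 * c for b, c in zip(x2, x3)]
    method_a = root(1 - cross(d, d) / s11)
    method_b = cross(x1, est) / (s11 * cross(est, est)) ** 0.5
    method_c = root((b12 * s12 + b13 * s13) / s11)
    return method_a, method_b, method_c


b12_first = -1.50
b13_first = (s13 - b12_first * s23) / s33
first = three_methods(b12_first, b13_first)

b12_3 = (s12 * s33 - s13 * s23) / (s22 * s33 - s23 ** 2)
b13_2 = (s13 * s22 - s12 * s23) / (s22 * s33 - s23 ** 2)
final = three_methods(b12_3, b13_2)

print("first approximation:", [round(v, 3) for v in first])
print("calculated values:  ", [round(v, 3) for v in final])
```

The three methods disagree for the first approximation and agree once the mathematically calculated regressions are used, as stated in the conclusions that follow.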

The conclusions to be drawn are: (1) Some error will be made in the
multiple correlation coefficient if some error is made in the final
approximations to the partial regressions; (2) the computed multiple
correlation coefficients seem to be different, depending on which formula or
method is used for computing them (if the mathematically computed regressions
are used, then the same results will be obtained by each method); and (3) the
error seems to vary directly with the error made in estimating the
regressions and inversely with the size of the correlation between the
independent variables.

Step 7. Interpreting the Correlations Indicated in the Scatter Diagrams

In general, the regression lines obtained in the several charts of
the Bean method have been interpreted correctly as "net", that is, partial,
regressions between the dependent variable and the separate independent
variables. Most of the confusion has come in interpreting the correlations
indicated by the plotted observations in the scatter diagrams. This point
can be cleared up if one is careful to note the exact meaning of each of
the two variables represented by the horizontal and vertical scales of the
charts, and considers the "visually indicated" correlation to be the simple
correlation of those two variables.

In the first chart (section A, figure 1) the two variables represented
by the vertical and horizontal scales are simply the dependent variable and
one of the independent variables, X2. Hence, this chart indicates the simple
correlation, r12.

With respect to the second chart (figure 1, section B), if X1 - b12.3X2
is considered as a variable, V1, and the simple correlation between V1 and
X3 is obtained, the resulting correlation will be equal to the part
correlation 13r2, as defined by Ezekiel. 23/ In this sense we can say that
the second chart indicates the part correlation 13r2. Likewise, if
X1 - b13.2X3 is considered as another variable, V2, and the simple
correlation between V2 and X2 is obtained, that correlation will be equal to
the part correlation 12r3. If X2 were used as the second independent variable
instead of X3, the second chart would then indicate 12r3. This explains why
different correlations are indicated in the final charts when the order in
which the


23/ Ezekiel. Op. cit., p. 181-183.

By making certain substitutions in Ezekiel's formula for part
correlation, it can be shown that

$${}_{13}r_{2}^{2} = \frac{r_{13.2}^{2}}{1 - (1 - r_{13.2}^{2})\, r_{23}^{2}}$$

Since, in the denominator of this formula, the quantity
$(1 - r_{13.2}^{2})\, r_{23}^{2}$ is non-negative and less than unity, the
part correlation between two variables is always greater than or at most
equal to the partial correlation between the same variables and the
difference increases as r23 increases.
variables are used is changed. However, part correlations do not appear
to have much meaning in the interpretation of an actual problem.
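The distinction between the part correlation indicated by the second chart and the partial correlation can be checked numerically. The sketch below is illustrative only and assumes the data for the problem in which r23 = -.91 from table 1 of the Appendix. It computes the part correlation 13r2 as the simple correlation between V1 and X3, computes the partial correlation r13.2 from the usual formula in terms of simple correlations, and compares the two through the relation given in footnote 23. The names are assumptions of this sketch.

```python
# Data for the problem in which r23 = -.91 (Appendix, table 1).
X1 = [16, 12, 4, 36, 32, 20, 4, 1, 31, 24]
X2 = [18, 21, 27, 9, 9, 8, 20, 25, 1, 12]
X3 = [8, 6, 2, 18, 14, 10, 4, 1, 16, 11]


def deviations(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def cross(u, v):
    return sum(a * b for a, b in zip(u, v))


def corr(u, v):
    """Simple correlation between two series of deviations from their means."""
    return cross(u, v) / (cross(u, u) * cross(v, v)) ** 0.5


x1, x2, x3 = deviations(X1), deviations(X2), deviations(X3)
r12, r13, r23 = corr(x1, x2), corr(x1, x3), corr(x2, x3)

s12, s13, s23 = cross(x1, x2), cross(x1, x3), cross(x2, x3)
s22, s33 = cross(x2, x2), cross(x3, x3)
b12_3 = (s12 * s33 - s13 * s23) / (s22 * s33 - s23 ** 2)

# V1 = X1 - b12.3 X2, in deviation form; plotted against X3 in the second chart.
v1 = [a - b12_3 * b for a, b in zip(x1, x2)]
part = corr(v1, x3)                                                    # 13r2
partial = (r13 - r12 * r23) / ((1 - r12 ** 2) * (1 - r23 ** 2)) ** 0.5  # r13.2

# Relation of footnote 23 between the squared part and partial correlations.
from_footnote = partial ** 2 / (1 - (1 - partial ** 2) * r23 ** 2)

print(f"part correlation 13r2     = {part:.3f}")
print(f"partial correlation r13.2 = {partial:.3f}")
```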

Deviations  from  the  Regressions 

Some investigators have been puzzled by the fact that the deviations
from the regression lines in certain charts are exactly equal. In the
problem outlined here, the deviation for any particular observation of V1
from a line with slope b13.2 in section B, figure 1, would be exactly equal
to the deviation of X1 from the regression in section B, figure 3. Also,
if X1 had been plotted against X3, a line with slope b13.2 inserted,
deviations plotted against X2, and the regression line with slope b12.3 drawn
in, the deviation from this line for any observation would be exactly equal
to the deviation obtained in section B in either figure 1 or figure 3 for
that observation. Thus, for example, if the mathematically calculated
values for b12.3 and b13.2 had been obtained, the three quantities, d1,
d2 and d3 given in figure 1, sections A and B, and figure 3, section B,
would be numerically equal. This follows from the fact that the deviation
for the ith observation in each of these charts is given by

$$d_{i} = x_{1i} - b_{12.3}\, x_{2i} - b_{13.2}\, x_{3i} \qquad (10)$$

the variables being taken as deviations from their respective means.
APPENDIX

Table 1.- Data for figure 2

                Problem in which r23 = .19    Problem in which r23 = -.91
Observation        X1       X2       X3          X1       X2       X3

     1            23.2      5.7      3.9         16       18        8
     2             3.6      6.6     24.0         12       21        6
     3            13.2     13.9     13.6          4       27        2
     4            21.5     17.8     14.1         36        9       18
     5            20.0     13.7      8.1         32        9       14
     6            25.7     14.7     14.3         20        8       10
     7            16.9     13.9     11.5          4       20        4
     8            14.0     12.6     13.4          1       25        1
     9            23.6     13.0     13.5         31        1       16
    10            23.7     13.5     11.0         24       12       11
    11             9.0      9.3     14.8
    12             7.7      5.2     12.0
    13            14.7      2.2      6.5

Mean              16.7     10.9     12.4         18       15        9

Mathematically calculated regression coefficients:
                  b12.3 =  .956                  b12.3 =  .122
                  b13.2 = -.927                  b13.2 = 2.307
MATHEMATICAL APPENDIX

In this appendix the following symbols, which may be new to some
readers, will be used.

$$a_{12} = \frac{\sum (X_1 - \bar{X}_1)(X_2 - \bar{X}_2)}{n - 1}, \qquad a_{13} = \frac{\sum (X_1 - \bar{X}_1)(X_3 - \bar{X}_3)}{n - 1}, \quad \text{etc.}$$

$$a_{11} = \frac{\sum (X_1 - \bar{X}_1)^{2}}{n - 1}, \qquad a_{22} = \frac{\sum (X_2 - \bar{X}_2)^{2}}{n - 1}, \quad \text{etc.}$$

where $\bar{X}_1$ is the mean of $X_1$, etc. and n is the number of
observations in the sample.

The following transformations can be made.

$$a_{12} = s_1 s_2 r_{12}, \qquad a_{13} = s_1 s_3 r_{13}, \quad \text{etc.}$$

$$a_{11} = s_1^{2}, \qquad a_{22} = s_2^{2}, \quad \text{etc.}$$

where $s_1$ is the standard deviation of $X_1$, etc.

It can be shown that in terms of the standard deviations and
correlations

$$b_{12.3} = \frac{s_1}{s_2} \cdot \frac{r_{12} - r_{13}\, r_{23}}{1 - r_{23}^{2}} \qquad (1)$$

and

$$b_{13.2} = \frac{s_1}{s_3} \cdot \frac{r_{13} - r_{12}\, r_{23}}{1 - r_{23}^{2}} \qquad (2)$$
Note 1. It is desired to determine the simple regression coefficient of
$V_1$ on $X_3$ when $V_1 = (X_1 - \bar{X}_1) - b_{12.3}\,(X_2 - \bar{X}_2)$.

Now

$$b_{V_1 X_3} = \frac{\sum V_1 (X_3 - \bar{X}_3)}{\sum (X_3 - \bar{X}_3)^{2}} = \frac{\sum \left[ (X_1 - \bar{X}_1) - b_{12.3}(X_2 - \bar{X}_2) \right] (X_3 - \bar{X}_3)}{\sum (X_3 - \bar{X}_3)^{2}} = \frac{a_{13} - b_{12.3}\, a_{23}}{a_{33}}$$

Substituting the value of $b_{12.3}$ from equation (1) and simplifying,

$$b_{V_1 X_3} = \frac{s_1}{s_3} \cdot \frac{r_{13} - r_{12}\, r_{23}}{1 - r_{23}^{2}}$$

Therefore, by equation (2),

$$b_{V_1 X_3} = b_{13.2}$$

Note 2. Iterative Process

In the method of least squares, the coefficients $b_{12.34}$, $b_{13.24}$
and $b_{14.23}$ are determined by minimizing the quantity

$$f(b_{12.34}, b_{13.24}, b_{14.23}) = \sum \left[ (X_1 - \bar{X}_1) - b_{12.34}(X_2 - \bar{X}_2) - b_{13.24}(X_3 - \bar{X}_3) - b_{14.23}(X_4 - \bar{X}_4) \right]^{2}$$

The solution yields the following three normal equations.

$$b_{12.34}\, a_{22} + b_{13.24}\, a_{23} + b_{14.23}\, a_{24} = a_{12}$$

$$b_{12.34}\, a_{23} + b_{13.24}\, a_{33} + b_{14.23}\, a_{34} = a_{13}$$

$$b_{12.34}\, a_{24} + b_{13.24}\, a_{34} + b_{14.23}\, a_{44} = a_{14}$$

These equations can be solved by well known methods.

In the iterative process, values $b_{12.34}^{(1)}$ and $b_{13.24}^{(1)}$ are
guessed for $b_{12.34}$ and $b_{13.24}$ respectively in equation (3) and a
solution for $b_{14.23}$ (say $b_{14.23}^{(1)}$) is obtained from this
equation. Then $b_{13.24}^{(1)}$ and $b_{14.23}^{(1)}$ are substituted in
equation (4) for $b_{13.24}$ and $b_{14.23}$ respectively and a second
approximation for $b_{12.34}$ (say $b_{12.34}^{(2)}$) is obtained. The values
$b_{12.34}^{(2)}$ and $b_{14.23}^{(1)}$ are substituted in equation (5) for
$b_{12.34}$ and $b_{14.23}$ respectively and a second approximation to
$b_{13.24}$ (say $b_{13.24}^{(2)}$) is obtained. The values $b_{12.34}^{(2)}$
and $b_{13.24}^{(2)}$ are substituted in equation (3) for $b_{12.34}$ and
$b_{13.24}$ respectively and a second approximation to $b_{14.23}$ (say
$b_{14.23}^{(2)}$) is obtained. This process is repeated until the
coefficients converge to stable values.

The iterative process outlined above is equivalent to the following
steps:

(1) Assign values $b_{12.34}^{(1)}$ and $b_{13.24}^{(1)}$ to $b_{12.34}$ and
$b_{13.24}$ respectively in the function $f(b_{12.34}, b_{13.24}, b_{14.23})$
defined above. Find that value of $b_{14.23}$ which makes
$f(b_{12.34}^{(1)}, b_{13.24}^{(1)}, b_{14.23})$ a minimum. Let it be
$b_{14.23}^{(1)}$.

(2) Find that value of $b_{12.34}$ which makes
$f(b_{12.34}, b_{13.24}^{(1)}, b_{14.23}^{(1)})$ a minimum. Let this value be
$b_{12.34}^{(2)}$.

(3) Find that value of $b_{13.24}$ which makes
$f(b_{12.34}^{(2)}, b_{13.24}, b_{14.23}^{(1)})$ a minimum. Let this value be
$b_{13.24}^{(2)}$.

(4) Find that value of $b_{14.23}$ which makes
$f(b_{12.34}^{(2)}, b_{13.24}^{(2)}, b_{14.23})$ a minimum. Let this value be
$b_{14.23}^{(2)}$, etc.

It will be seen that the steps involved in the latter process are identical
with those outlined on pages 10-12 for the graphic method involving three
variables.
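The step-by-step minimization just described can be sketched for four variables as follows. The sketch is illustrative only: the six observations are hypothetical values invented for this example and do not come from the paper, and the function names are likewise assumptions. Each pass finds, in the order used in the text, the value of one coefficient that makes f a minimum while the other two are held fixed.

```python
# Hypothetical data (not from the paper): six observations on X1, X2, X3, X4.
X = [
    [3, 5, 4, 8, 7, 9],   # X1, the dependent variable
    [1, 2, 3, 4, 5, 6],   # X2
    [2, 1, 4, 3, 6, 5],   # X3
    [5, 3, 6, 1, 2, 4],   # X4
]
n = len(X[0])
x = [[v - sum(col) / n for v in col] for col in X]   # deviations from the means
# a[i][j]: sum of products of deviations (the divisor n - 1 cancels throughout).
a = [[sum(p * q for p, q in zip(x[i], x[j])) for j in range(4)] for i in range(4)]


def f(b):
    """Sum of squared deviations for b = [b12.34, b13.24, b14.23]."""
    return sum(
        (x[0][k] - b[0] * x[1][k] - b[1] * x[2][k] - b[2] * x[3][k]) ** 2
        for k in range(n)
    )


def best_value(b, j):
    """Value of b[j] that makes f a minimum with the other two held fixed."""
    others = sum(b[m] * a[j + 1][m + 1] for m in range(3) if m != j)
    return (a[0][j + 1] - others) / a[j + 1][j + 1]


b = [0.0, 0.0, 0.0]     # b[0] and b[1] are arbitrary guesses; b[2] is replaced at once
history = []            # value of f after every single step
for _ in range(2000):
    for j in (2, 0, 1):  # order of the text: b14.23, then b12.34, then b13.24
        b[j] = best_value(b, j)
        history.append(f(b))

# Amount by which each normal equation fails to be satisfied at the end.
residuals = [a[0][j + 1] - sum(b[m] * a[j + 1][m + 1] for m in range(3))
             for j in range(3)]
print("coefficients:", [round(v, 4) for v in b])
print("normal equations satisfied:", all(abs(r) < 1e-8 for r in residuals))
```

Because every step is an exact minimization in one coefficient, the sum of squared deviations recorded in `history` never increases, and the values reached satisfy all three normal equations.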

Speed of Convergence

The problem of the speed of convergence will be considered for three
variables only. The normal equations for three variables are given by

$$b_{12.3}\, a_{22} + b_{13.2}\, a_{23} = a_{12} \qquad (6)$$

$$b_{12.3}\, a_{23} + b_{13.2}\, a_{33} = a_{13} \qquad (7)$$

If the iterative process is applied, the nth approximation to $b_{12.3}$ and
$b_{13.2}$ respectively can be shown to be equal to

$$b_{12.3}^{(n)} = \frac{s_1}{s_2}\left( r_{12} - r_{13} r_{23} + r_{12} r_{23}^{2} - r_{13} r_{23}^{3} + - \cdots - r_{13} r_{23}^{2n-3} \right) + b_{12.3}^{(1)}\, r_{23}^{2n-2} \qquad (8)$$

and

$$b_{13.2}^{(n)} = \frac{s_1}{s_3}\left( r_{13} - r_{12} r_{23} + r_{13} r_{23}^{2} - r_{12} r_{23}^{3} + - \cdots + r_{13} r_{23}^{2n-2} \right) - \frac{s_2}{s_3}\, b_{12.3}^{(1)}\, r_{23}^{2n-1} \qquad (9)$$

$$\phantom{b_{13.2}^{(n)}} = \frac{s_1}{s_3}\left( r_{13} - r_{12} r_{23} + r_{13} r_{23}^{2} - r_{12} r_{23}^{3} + - \cdots - r_{12} r_{23}^{2n-3} \right) + b_{13.2}^{(1)}\, r_{23}^{2n-2} \qquad (10)$$

where $b_{12.3}^{(n)}$ and $b_{12.3}^{(1)}$ are the nth and the 1st
approximations respectively to $b_{12.3}$, and $b_{13.2}^{(n)}$ and
$b_{13.2}^{(1)}$ are the nth and the 1st approximations respectively to
$b_{13.2}$.

But

$$b_{12.3} = \frac{s_1}{s_2}\left( r_{12} - r_{13} r_{23} + r_{12} r_{23}^{2} - r_{13} r_{23}^{3} + - \cdots \right) \qquad (11)$$

and

$$b_{13.2} = \frac{s_1}{s_3}\left( r_{13} - r_{12} r_{23} + r_{13} r_{23}^{2} - r_{12} r_{23}^{3} + - \cdots \right) \qquad (12)$$

which can be obtained by expanding the denominator of

$$b_{12.3} = \frac{s_1}{s_2} \cdot \frac{r_{12} - r_{13}\, r_{23}}{1 - r_{23}^{2}} \qquad (1)$$

and

$$b_{13.2} = \frac{s_1}{s_3} \cdot \frac{r_{13} - r_{12}\, r_{23}}{1 - r_{23}^{2}} \qquad (2)$$

in an infinite series.

Hence, comparing equations (8) with (11) and (9) or (10) with (12) it
will be seen that $b_{12.3}^{(n)}$ and $b_{13.2}^{(n)}$ can be made to
approximate $b_{12.3}$ and $b_{13.2}$ respectively as closely as desired by
taking n sufficiently large.

The difference between the mathematically calculated value of $b_{12.3}$
and the nth approximation is given by

$$b_{12.3} - b_{12.3}^{(n)} = r_{23}^{2n-2} \left( b_{12.3} - b_{12.3}^{(1)} \right) \qquad (13)$$

and the difference between the mathematically calculated value of $b_{13.2}$
and the nth approximation is given by

$$b_{13.2} - b_{13.2}^{(n)} = -\frac{s_2}{s_3}\, r_{23}^{2n-1} \left( b_{12.3} - b_{12.3}^{(1)} \right) \qquad (14)$$

$$\phantom{b_{13.2} - b_{13.2}^{(n)}} = r_{23}^{2n-2} \left( b_{13.2} - b_{13.2}^{(1)} \right) \qquad (15)$$

It is evident from equation (13) that the speed of convergence of
successive approximations to $b_{12.3}$ varies directly with the error made
in $b_{12.3}^{(1)}$ and inversely with the size of $r_{23}$. Likewise from
equation (15), the speed of convergence of successive approximations to
$b_{13.2}$ varies directly with the error made in $b_{13.2}^{(1)}$ and
inversely with the size of $r_{23}$.
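Equations (13), (14) and (15) can be checked against the iterative process itself. The sketch below is illustrative only and assumes the data for the problem in which r23 = -.91 from table 1 of the Appendix and the arbitrary first approximation b12.3 = -1.50. For each of the first ten approximations it compares the error actually remaining with the error predicted by the three equations. The names are assumptions of this sketch.

```python
# Data for the problem in which r23 = -.91 (Appendix, table 1).
X1 = [16, 12, 4, 36, 32, 20, 4, 1, 31, 24]
X2 = [18, 21, 27, 9, 9, 8, 20, 25, 1, 12]
X3 = [8, 6, 2, 18, 14, 10, 4, 1, 16, 11]


def deviations(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def cross(u, v):
    return sum(a * b for a, b in zip(u, v))


x1, x2, x3 = deviations(X1), deviations(X2), deviations(X3)
s12, s13, s23 = cross(x1, x2), cross(x1, x3), cross(x2, x3)
s22, s33 = cross(x2, x2), cross(x3, x3)

r23 = s23 / (s22 * s33) ** 0.5
s2_over_s3 = (s22 / s33) ** 0.5          # ratio of the standard deviations
b12_exact = (s12 * s33 - s13 * s23) / (s22 * s33 - s23 ** 2)
b13_exact = (s13 * s22 - s12 * s23) / (s22 * s33 - s23 ** 2)

b12_start = -1.50
b13_start = (s13 - b12_start * s23) / s33

rows = []
b12 = b12_start
for n in range(1, 11):
    b13 = (s13 - b12 * s23) / s33
    eq13 = r23 ** (2 * n - 2) * (b12_exact - b12_start)
    eq14 = -s2_over_s3 * r23 ** (2 * n - 1) * (b12_exact - b12_start)
    eq15 = r23 ** (2 * n - 2) * (b13_exact - b13_start)
    rows.append((b12_exact - b12, eq13, b13_exact - b13, eq14, eq15))
    b12 = (s12 - b13 * s23) / s22

for n, (e12, p13, e13, p14, p15) in enumerate(rows, start=1):
    print(f"{n:>2}  error in b12.3 = {e12:7.4f}   error in b13.2 = {e13:7.4f}")
```

Each approximation removes the same fraction, one minus the square of r23, of the error remaining from the one before, so that the nearer r23 is to unity the more approximations are required.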

Note 3. It is desired to determine the effect on the partial regressions and
the multiple correlation coefficient of passing the regressions through some
points other than those determined by the means.

In a three variable problem, the regression equation is given by

$$X_1 = \bar{X}_1 + b_{12.3}\,(X_2 - \bar{X}_2) + b_{13.2}\,(X_3 - \bar{X}_3)$$

Let a line with slope equal to $b_{12.3}$ be drawn in the first chart of
the graphic method. Let the line be drawn $d_1$ units above the mean of $X_1$
at the point $(\bar{X}_1, \bar{X}_2)$. This line is then given by

$$X_1 = d_1 + \bar{X}_1 + b_{12.3}\,(X_2 - \bar{X}_2)$$

[The constant $d_1$ would be zero if the line passed through the point
$(\bar{X}_1, \bar{X}_2)$.]

Let

$$V_1 = X_1 - d_1 - \bar{X}_1 - b_{12.3}\,(X_2 - \bar{X}_2)$$

Then in the second chart $V_1$ can be written as a regression on $X_3$ in the
form

$$V_1 = d_2 + c\,(X_3 - \bar{X}_3)$$

If the least squares method is used for determining $d_2$ and c, it will be
found that $c = b_{13.2}$ and $d_2 = -d_1$.

Thus it is seen that if the line with slope $b_{12.3}$ is passed through a
point $d_1$ units above the mean of $X_1$ at the point
$(\bar{X}_1, \bar{X}_2)$, the second regression will have a slope equal to
$b_{13.2}$ and will pass through a point $d_1$ units below the zero line at
the point $(0, \bar{X}_3)$. Thus, if $V_2$ denotes the deviation from this
second regression,

$$V_2 = V_1 - \left[ -d_1 + b_{13.2}\,(X_3 - \bar{X}_3) \right]$$

$$\phantom{V_2} = (X_1 - \bar{X}_1) - b_{12.3}\,(X_2 - \bar{X}_2) - d_1 + d_1 - b_{13.2}\,(X_3 - \bar{X}_3)$$

$$\phantom{V_2} = (X_1 - \bar{X}_1) - b_{12.3}\,(X_2 - \bar{X}_2) - b_{13.2}\,(X_3 - \bar{X}_3) \qquad (16)$$

This is the same as would have been obtained if the regressions had been
passed through the means. Therefore, the multiple correlation is not affected
by passing the regressions through points other than those determined by the
means.
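The result of Note 3 can be verified numerically. The sketch below is illustrative only. It assumes the data for the problem in which r23 = -.91 from table 1 of the Appendix, and the displacement `d1 = 5.0` is an arbitrary value chosen for this example. The first line is drawn d1 units too high, the deviations V1 are fitted to X3 by least squares, and the final deviations are compared with those obtained when both regressions pass through the means.

```python
# Data for the problem in which r23 = -.91 (Appendix, table 1).
X1 = [16, 12, 4, 36, 32, 20, 4, 1, 31, 24]
X2 = [18, 21, 27, 9, 9, 8, 20, 25, 1, 12]
X3 = [8, 6, 2, 18, 14, 10, 4, 1, 16, 11]
n = len(X1)
m1, m2, m3 = sum(X1) / n, sum(X2) / n, sum(X3) / n
x1 = [v - m1 for v in X1]
x2 = [v - m2 for v in X2]
x3 = [v - m3 for v in X3]


def cross(u, v):
    return sum(a * b for a, b in zip(u, v))


s12, s13, s23 = cross(x1, x2), cross(x1, x3), cross(x2, x3)
s22, s33 = cross(x2, x2), cross(x3, x3)
b12_3 = (s12 * s33 - s13 * s23) / (s22 * s33 - s23 ** 2)
b13_2 = (s13 * s22 - s12 * s23) / (s22 * s33 - s23 ** 2)

d1 = 5.0   # arbitrary displacement of the first line above the means (hypothetical)

# Deviations from the displaced line in the first chart.
v1 = [X1[i] - d1 - m1 - b12_3 * x2[i] for i in range(n)]

# Least-squares line V1 = d2 + c (X3 - mean of X3) in the second chart.
c = cross(v1, x3) / s33
d2 = sum(v1) / n

# Final deviations with the displaced line, and with lines through the means.
v2 = [v1[i] - (d2 + c * x3[i]) for i in range(n)]
through_means = [x1[i] - b12_3 * x2[i] - b13_2 * x3[i] for i in range(n)]

print(f"c  = {c:.3f}   (b13.2 = {b13_2:.3f})")
print(f"d2 = {d2:.3f}  (d1 = {d1:.3f})")
```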




